Prediction of Antileishmanial Compounds: General Model, Preparation, and Evaluation of 2-Acylpyrrole Derivatives

In this work, the SOFT.PTML tool has been used to pre-process a ChEMBL dataset of pre-clinical assays of antileishmanial compound candidates. A comparative study of different ML algorithms, such as logistic regression (LOGR), support vector machine (SVM), and random forests (RF), has shown that the IFPTML-LOGR model presents excellent values of specificity and sensitivity (81–98%) in training and validation series. The use of this software has been illustrated with a practical case study focused on a series of 28 derivatives of 2-acylpyrroles 5a,b, obtained through a Pd(II)-catalyzed C–H radical acylation of pyrroles. Their in vitro leishmanicidal activity against visceral (L. donovani) and cutaneous (L. amazonensis) leishmaniasis was evaluated, finding that compounds 5bc (IC50 = 30.87 μM, SI > 10.17) and 5bd (IC50 = 16.87 μM, SI > 10.67) were approximately 6-fold more selective than the drug of reference (miltefosine) in in vitro assays against L. amazonensis promastigotes. In addition, most of the compounds showed low cytotoxicity, CC50 > 100 μg/mL in J774 cells. Interestingly, the IFPTML-LOGR model correctly predicts the relative biological activity of this series of acylpyrroles. A computational high-throughput screening (cHTS) study of 2-acylpyrroles 5a,b has been performed, calculating >20,700 activity scores vs a large space of 647 assays involving multiple Leishmania species, cell lines, and potential target proteins. Overall, the study demonstrates that the SOFT.PTML all-in-one strategy is useful to obtain IFPTML models in a friendly interface, making the work easier and faster than before. The present work also points to 2-acylpyrroles as new lead compounds worthy of further optimization as antileishmanial hits.


INTRODUCTION
Leishmaniasis is a parasitic disease, caused by protozoan pathogens of the genus Leishmania, that may present different clinical manifestations including cutaneous (CL), visceral or kala-azar (VL), post-kala-azar dermal (PKDL), and mucocutaneous (MCL) leishmaniasis. Like other neglected diseases, leishmaniasis remains a major global health problem, as it is endemic in around 100 countries with more than 350 million people at risk. 1 Treatment of leishmaniasis relies mainly on a few drugs: pentavalent antimonials, paromomycin, pentamidine, liposomal amphotericin B (ampB), fluconazole, and miltefosine, depending on the etiological species, the infection type, and also the geographical region because of the increasing number of resistant strains. Additionally, the use of these drugs is associated with a number of severe side effects related to their toxicity. 2−5 Therefore, it is necessary to identify new effective antileishmanial compounds with chemotypes other than the ones in clinical use. In this context, nitrogen heterocycles can be considered privileged scaffolds because approximately 60% of U.S. FDA approved small-molecule drugs contain a nitrogen heterocycle. 6 In particular, the pyrrole core has attracted our attention because this motif is embedded in a variety of natural products (e.g., prodiginines, 7 bromopyrrole, 8 and spiroindimicin alkaloids 9 ) with antiparasitic activity. 10 Regarding synthetic derivatives, pyridinyl aryl pyrroles 1 and 2 have proven to be inhibitors of casein kinase 1 that block the growth of Leishmania major promastigotes in vitro. 11 1,2-Diarylpyrroles 3 have been identified as a new class of compounds active against the amastigote stage of Leishmania infantum by inhibiting trypanothione reductase. 12 On the other hand, 2-acylpyrrole derivatives 4 also exhibited promising antileishmanial profiles (Figure 1). 13 Additionally, it has been reported that pyrrole-indolinone SU11652, a sunitinib analog, targets the nucleoside diphosphate kinase from Leishmania parasites. 14 We recently synthesized 2-acylpyrroles through Pd(II)-catalyzed radical C−H acylation of pyrrole derivatives. 15 This efficient and flexible protocol allowed us to collect a small library of 2-acylpyrroles 5, variably substituted on the aryl ring, and with a pyrimidine (series 5a) or pyridine (series 5b) ring linked to the nitrogen atom of the pyrrole nucleus (Figure 1). These structural features make our pyrrole derivatives interesting candidates to be tested as potential antileishmanial compounds.
In this context, cheminformatic modeling can be a good option to reduce the development cost and increase the probability of finding new antileishmanial hits. Classic cheminformatic models focus on accelerating the antiparasitic drug discovery process by reducing the number of compounds to be assayed by trial-and-error tests. However, in addition to the large number of compounds to be tested, other factors may slow down this process. For example, the large number of combinations of biological parameters (MIC, IC50, pKi, etc.), parasite species, parasite stages, or target proteins greatly increases the time and cost per compound to be tested. Unfortunately, classic cheminformatic models fail to perform multiobjective optimization of antiparasitic compounds due to the difficulty of encoding the multiple boundary conditions of each assay (parameter, protein, cell line, species, parasite stage, etc.) and the need to obtain this information from many different data sources. We have recently reported the first PTML [perturbation theory (PT) + machine learning (ML)] model that is capable of both explaining a very large dataset of preclinical assays of antileishmanial compounds and predicting the activity of new heterocycles (e.g., two series of pyrroloisoquinolines synthesized by our group) against different species of Leishmania. 16 Nevertheless, the development process of this first PTML model and its subsequent use for the prediction of new antileishmanial hits were laborious.
On the other hand, we have coined the term IFPTML [information fusion (IF) + perturbation theory (PT) + machine learning (ML)] for a new algorithm designed for multiobjective optimization of compounds. When the IF stage is missing, only the term PTML is used. 17,18 These IFPTML models have been used in medicinal chemistry, proteomics, metabolomics, and nanotechnology. 19,20 The first phase (IF + PT) of these IFPTML models consists of merging information from different sources and/or transforming the original variables into PT operators (PTOs). These PTOs are new input variables useful for encoding information about multiple assay conditions from different sources. For example, PTOs can be used to encode information about protein targets, cell lines, microbial metabolic networks of target organisms, nanoparticle carriers of the drug, etc. 19,20 Next, the IFPTML workflow enters the ML phase, which uses classic ML algorithms. Until recently, training PTML models required running different software for each stage of the algorithm (IF, PT, and ML), 19,20 as was the case for our previous IFPTML model for antileishmanial compounds. 16 A calculation sheet was needed to run the first phase, ML software to seek the model, and a new calculation sheet to run predictions. This problem drew the attention of cheminformatics software developers to the need for new platforms to unify the different steps of IFPTML analysis. To this end, we have introduced the QSAR-Co tool, which jointly runs the PT and ML stages of the algorithm. 21 However, QSAR-Co cannot run IF procedures to calculate multilabel PTOs or reference functions that encode multiple assay conditions at the same time. Furthermore, QSAR-Co only calculates one class of PT operators, called single-condition moving averages.
Consequently, it needs as many PTOs as there are boundary conditions in the problem, which implies a significantly higher number of variables to explore with respect to the multilabel PTOs used in IFPTML algorithms. 19,20 These PTOs have proven to be very useful in reducing the problem dimensionality, as in the case of the ChEMBL dataset of antileishmanial pre-clinical assays. 16 Therefore, we introduced the SOFT.PTML studio tool, which can calculate multilabel PTOs, including multilabel/multicondition reference functions, moving averages, covariances, etc. SOFT.PTML has been used successfully in nanotechnology and medicinal chemistry. 22,23 In the present work, we report for the first time the use of SOFT.PTML to seek IFPTML models for antileishmanial compounds, performing a comparative study of different ML algorithms. We have also carried out a predictive study of a series of 2-acylpyrroles, previously synthesized by our group, 15 together with experimental preparation of new samples for assay and their in vitro leishmanicidal testing. Some of the 2-acylpyrroles tested compare favorably with miltefosine (reference compound) in terms of activity and toxicity. This work opens a new experimental line of research focused on the synthesis and optimization of 2-acylpyrrole derivatives as antileishmanial compounds. It also lays the ground for the development of faster and more user-friendly IFPTML models for other neglected tropical diseases.
The general flowchart showing the interconnections between the different parts of this work, (1) chemoinformatics study, (2) organic synthesis, and (3) biological assays, is depicted in Figure 2.

Computational Methods. 2.1.1. IFPTML Model Basis.
An IFPTML model is proposed to calculate the values of the antileishmanial biological activity scoring function f(v_ij)_calc of the ith query compound in the jth assay with multiple boundary conditions c_j = [c_0, c_1, c_2, ..., c_jmax]. In these classification models, the f(v_ij)_calc function, which takes dimensionless values, is used to score the propensity of the ith compound to reach a certain level of the biological activity values v_ij (see next section). 24 Consequently, the values of f(v_ij)_calc can be used directly to compare the relative propensity of two different compounds to reach a certain level of biological activity in the jth assay, relative to a threshold value cutoff_j. They can also be used to compare the behavior of the same compound in two different assays. The IFPTML model uses two types of input variables [functions of reference f(v_ij)_ref and perturbation theory operators PTO_k(c_j)] to calculate the f(v_ij)_calc output values. Thus, the IFPTML model starts with the values of a function of reference f(v_ij)_ref, which are used to characterize/identify the kind of biological activity to be modeled. Next, the values of the PTO_k(c_j) functions are used to measure the effect of perturbations on the biological activity outcome. PTO_k(c_j) functions quantify perturbations/deviations in the structure of the ith compound and/or in the assay conditions c_j compared to a set of reference compounds. 24 In the next section, the preprocessing of the raw data to construct the f(v_ij)_ref and PTO_k(c_j) functions is explained. IFPTML linear models have the following general form (eq 1), where a_0, a_1, and b_k are the coefficients fitted by the ML algorithm:

$$f(v_{ij})_{calc} = a_0 + a_1 \cdot f(v_{ij})_{ref} + \sum_{k=1}^{k_{max}} b_k \cdot PTO_k(c_j) \quad (1)$$

Figure 3 shows a workflow illustrating the integration of the different phases (IF + PT + ML) of the IFPTML analysis. First, the IF phase, which includes data collection, data rearrangement, and data fusion (horizontal and vertical), is run. The vertical IF involves aligning the output values v_ij for different output parameters (IC50, Ki, etc.) in the same column. These parameters are then transformed into a Boolean variable f(v_ij)_obs using different cutoff values (see next section). Horizontal IF involves merging multiple labels from pre-clinical assays to form multicondition label variables c_j = [c_1, c_2, ... c_j]. Next, the PT phase is run by calculating the PTO(D_k, c_j) operators, which can encode structural information (D_k) and multiple assay conditions (c_j) at the same time. In the last phase, the different ML algorithms are run using the PTO(D_k, c_j) values as input.
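As a rough illustration of how a linear score of this kind could be evaluated, the following sketch combines one reference-function value with two PT operators. It is only a sketch under stated assumptions: the PT operator is taken to be the deviation of a descriptor from the mean of reference compounds measured under the same conditions, and all coefficients and descriptor values are hypothetical, not those of the fitted model.

```python
from statistics import mean

def moving_average_deviation(value, reference_values):
    """Assumed PT operator: deviation of a descriptor from the mean of
    reference compounds measured under the same assay conditions c_j."""
    return value - mean(reference_values)

def linear_score(f_ref, ptos, a0, a_ref, b):
    """Generic linear form: a0 + a_ref * f_ref + sum_k b_k * PTO_k."""
    if len(ptos) != len(b):
        raise ValueError("one coefficient per PT operator is required")
    return a0 + a_ref * f_ref + sum(bk * p for bk, p in zip(b, ptos))

# Hypothetical numbers, for illustration only (not fitted values).
query = {"ALOGP": 3.0, "PSA": 50.0}
reference = {"ALOGP": [2.0, 3.0, 4.0], "PSA": [40.0, 60.0, 80.0]}

ptos = [moving_average_deviation(query[k], reference[k])
        for k in ("ALOGP", "PSA")]
score = linear_score(f_ref=0.5, ptos=ptos, a0=-1.0, a_ref=2.0, b=[0.5, -0.1])
```

With these made-up inputs, the ALOGP deviation is 0 and the PSA deviation is −10, so only the PSA term contributes beyond the reference function.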
The following sections provide more details about the IFPTML steps.
2.1.2. Data Pre-processing and IF Output (Horizontal IF). The SOFT.PTML tool was used to pre-process a ChEMBL dataset of pre-clinical assays of antileishmanial compound candidates. 16 This tool was also used to perform training/validation of alternative IFPTML models. These data included 145,851 biological activity values (v_ij) for the ith compounds tested in the jth preclinical assays with boundary experimental conditions (labels) c_j = [c_0, c_1, c_2, ... c_n]. The values v_ij are expressed as different antileishmanial activity parameters with label c_0, such as IC50, Ki, EC50, etc. The SOFT.PTML discretization procedure was used to convert all the values of biological activity v_ij into a Boolean objective function f(v_ij)_obs = 1 or 0. The function implemented in the software is f(v_ij)_obs = 1 IF (v_ij ≥ cutoff_j AND d(c_0) = 1) OR (v_ij ≤ cutoff_j AND d(c_0) = −1), ELSE f(v_ij)_obs = 0. In this equation, ≥ means higher than (>) or equal to (=) the cutoff. By analogy, ≤ means lower than (<) or equal to (=) the cutoff. The cutoff_j values are the cutoffs used for the different c_0-labeled biological activity parameters (IC50, Ki, EC50, etc.) in the different jth assays. The desirability parameter d(c_0) = 1 or −1 indicates whether the parameter has to be maximized or minimized, respectively, to obtain an optimal biological effect. This IFPTML model is multioutput, i.e., it can predict multiple output probabilities of activity p(f(v_ij) = 1) for the same ith drug entry. Consequently, to train the model, different values of the biological activity parameter v_ij (Ki, IC50, etc.) have been transformed into Boolean variables f(v_ij) = 1 OR 0. Since each biological activity parameter v_ij (Ki, IC50, etc.) has a different scale and optimal region, different cutoff values were used for each parameter v_ij.
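The discretization rule described above can be sketched in a few lines of Python. This is a minimal sketch, not the SOFT.PTML implementation: the function names, the parameter labels, and the cutoff values in `CUTOFFS` are hypothetical and chosen only to show both desirability directions.

```python
def f_obs(value, cutoff, desirability):
    """Boolean objective function: 1 when the value reaches the cutoff
    in the desirable direction, otherwise 0."""
    if desirability == 1:       # parameter to be maximized
        return 1 if value >= cutoff else 0
    if desirability == -1:      # parameter to be minimized (e.g., IC50)
        return 1 if value <= cutoff else 0
    raise ValueError("desirability must be 1 or -1")

# Hypothetical (cutoff, desirability) pairs per activity parameter c_0;
# these are NOT the cutoffs used in this work.
CUTOFFS = {
    "IC50 (nM)": (1000.0, -1),
    "Inhibition (%)": (50.0, 1),
}

def discretize(records):
    """Map (c_0 label, v_ij) pairs to Boolean f(v_ij)_obs values."""
    return [f_obs(value, *CUTOFFS[label]) for label, value in records]

labels = discretize([
    ("IC50 (nM)", 250.0),       # below cutoff, minimized -> 1
    ("IC50 (nM)", 5000.0),      # above cutoff, minimized -> 0
    ("Inhibition (%)", 50.0),   # equal to cutoff, maximized -> 1
    ("Inhibition (%)", 12.0),   # below cutoff, maximized -> 0
])
```

Note that a value exactly equal to the cutoff counts as active in both directions, in line with the ≥ and ≤ conventions stated in the text.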
These values of cutoff_j were determined by trying to balance both the optimal activity region and the number of cases in the two classes f(v_ij) = 1 OR 0 for each [...] On the other hand, the structural information is also expressed as a vector D_ki = [D_1i, D_2i, D_3i, ... D_maxi]. The elements of this vector are D_1i = ALOGP_i, the n-octanol/water partition coefficient; D_2i = PSA_i, the topological polar surface area; and D_3i = NVLR, the number of violations of Lipinski's rule by the structure of the ith compound. The values of the molecular descriptors D_k were downloaded from ChEMBL and/or calculated with the software DRAGON 25 for new compounds. Next, the IF process was carried out to calculate the multicondition PT operators PTO_k(D_ki, c_j). Each PTO_k(D_ki, c_j) variable is the expression of the fusion of structural information from one or multiple elements of D_i and one or multiple elements of c_j. Therefore, a PTO_k(D_ki, c_j) is a function or operator (PTO_k) calculated to merge structural information

SOFT.PTML In Silico Screening of New Compounds.
The new model was used to study a series of 28 di(hetero)aryl ketone derivatives (2-acylpyrroles) synthesized in our group. 16 First, the SMILES codes of the 28 compounds were generated using ChemDraw Professional 20.1. 26 Next, the values of the input variables D_1 = LOGP, D_2 = PSA, and D_3 = NVLR were calculated. Then, these values were substituted into the SOFT.PTML model in order to obtain the probabilities of activity for each compound in different biological assays. A simulation of the biological response of 28 compounds + 1 control (miltefosine) in many different preclinical assays was carried out. These assays included a total of >50 different biological activity parameters [Ki (nM), IC50 (nM), Inhibition (%), etc.], 35 target proteins (P00374 dihydrofolate reductase, Q0GKD7 farnesyl pyrophosphate synthase, etc.), 28 cell lines (J774, HL-60, Jurkat, etc.), 40 assay organisms (L. donovani, L. major, L. amazonensis, etc.), and 2 microorganism development stages (amastigotes and promastigotes). In total, the outcome of the 29 compounds in 249 different pre-clinical assays was predicted.
Journal of Chemical Information and Modeling, pubs.acs.org/jcim, Article

Scheme 1. Synthesis of 2-Acylpyrroles 5a and 5b Screened against L. amazonensis and L. donovani

2.1.5. Data Sampling. In order to select the training and validation sets, we carried out the following steps. First, we ordered all assays according to the labels of the output property c_0 (IC50, Ki, etc.) and the discrete values of the two main partitions, c_I and c_II. After that, we assigned a value of set = train (t) or validation (v) from the beginning to the end following the pattern tttv. This allows us to use 75% (3/4) of the data for training the model and the remaining 25% (1/4) of the data for validation. The sorting of the cases by c_0, c_I, and c_II ensures a more representative distribution of all data strata or data subsets (properties, target proteins, pathogen species, etc.) in both training and validation series. The random sorting of all cases within each subset (same property label c_0, or assay conditions c_I and c_II) ensured a higher randomness of the sampling. In this way, we have carried out a stratified, random, and representative data sampling. As a result, we included a total of 6711 positive assays (f(v_ij) = 1) and 102,678 negative or control assays (f(v_ij) = 0) in the training series. In addition, we included a total of 2050 positive assays (f(v_ij) = 1) and 34,212 negative or control assays (f(v_ij) = 0) in the validation series.
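The sampling steps above can be sketched as follows. This is an illustrative sketch rather than the procedure coded in SOFT.PTML: the function names, the dictionary keys (`c0`, `cI`, `cII`), the fixed random seed, and the toy cases are assumptions introduced here.

```python
import random
from itertools import groupby

def assign_sets(cases, key, pattern="tttv", seed=0):
    """Sort cases by stratum, shuffle within each stratum, then label
    them train (t) or validation (v) following the repeating pattern."""
    rng = random.Random(seed)
    ordered = []
    for _, group in groupby(sorted(cases, key=key), key=key):
        group = list(group)
        rng.shuffle(group)          # random order inside each stratum
        ordered.extend(group)
    return [(case, pattern[i % len(pattern)])
            for i, case in enumerate(ordered)]

def count_pattern(n_cases, pattern="tttv"):
    """Number of training and validation cases produced by the pattern."""
    n_train = sum(1 for i in range(n_cases)
                  if pattern[i % len(pattern)] == "t")
    return n_train, n_cases - n_train

# Toy cases with hypothetical labels.
cases = [{"id": i, "c0": c0, "cI": "L. donovani", "cII": "promastigote"}
         for i, c0 in enumerate(["IC50", "Ki"] * 8)]
labelled = assign_sets(cases, key=lambda c: (c["c0"], c["cI"], c["cII"]))
n_train = sum(1 for _, s in labelled if s == "t")
n_valid = sum(1 for _, s in labelled if s == "v")
```

Applied to the 145,851 cases of the dataset, `count_pattern` gives 109,389 training and 36,462 validation cases, which agrees with the series sizes reported in this work.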
2.3.2. In Vitro Promastigote Susceptibility Assay. The biological assay was carried out following previously published protocols. 27,28 Briefly, log-phase promastigotes (2.5 × 10^5 parasites/well) were cultured in 96-well plastic plates. New samples of the compounds were prepared according to the protocol described above, and stock solutions of each compound to be assayed were prepared in DMSO at 50 mg/mL. We performed 1:2 serial dilutions of the compounds in fresh culture medium (100, 50, 25, 12.5, 6.25, 3.12, 1.56, and 0.78 μg/mL) up to a 200 μL final volume. Growth control and signal-to-noise control were also included. The final solvent (DMSO) concentrations never exceeded 0.5% (v/v), ensuring no effect on parasite proliferation or morphology. After 48 h at 26°C, 20 μL of a 2.5 mM resazurin solution was added to each well and the plates were returned to the incubator for another 3 h.
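As a quick arithmetic check of the dilution scheme, the sketch below regenerates the 1:2 series and estimates the DMSO content of the most concentrated well. It assumes, for illustration only, that the 50 mg/mL DMSO stock is diluted directly into culture medium, so that the solvent fraction equals the ratio of well concentration to stock concentration.

```python
def serial_dilution(top, factor=2.0, steps=8):
    """Concentrations of a serial dilution starting at `top`."""
    return [top / factor ** k for k in range(steps)]

# Eight 1:2 dilutions starting at 100 μg/mL.
concentrations = serial_dilution(100.0)
# Exact values: 100, 50, 25, 12.5, 6.25, 3.125, 1.5625, 0.78125 μg/mL;
# the text quotes the last three truncated to two decimals.

# DMSO fraction in the top well, assuming direct dilution of the stock.
stock_ug_per_ml = 50.0 * 1000.0          # 50 mg/mL expressed in μg/mL
dmso_percent_top_well = 100.0 * concentrations[0] / stock_ug_per_ml
```

Under that assumption, the top well contains about 0.2% (v/v) DMSO, which is consistent with the stated upper limit of 0.5% (v/v).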

Miltefosine was used as the reference drug and was evaluated under the same conditions. The efficacy of each compound was estimated by calculating the IC50 (concentration of the compound that produced a 50% reduction in parasites) using a multinomial probit analysis incorporated in SPSS software v21.0. The selectivity index (SI) was calculated as the ratio between cytotoxicity (CC50) and activity against parasites (IC50).
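The selectivity index defined above is a simple ratio, sketched here for clarity. The function name and the example concentrations are hypothetical; the two SI values compared at the end (SI > 10.17 for 5bc and SI = 1.80 for miltefosine against L. amazonensis promastigotes) are the ones quoted in this work.

```python
def selectivity_index(cc50, ic50):
    """SI = CC50 / IC50, with both values in the same concentration units."""
    if cc50 <= 0 or ic50 <= 0:
        raise ValueError("concentrations must be positive")
    return cc50 / ic50

# Hypothetical concentrations (μM), not measured values.
si_example = selectivity_index(cc50=300.0, ic50=30.0)

# Fold difference between two SI values quoted in the text:
# compound 5bc (SI > 10.17) and miltefosine (SI = 1.80).
fold_5bc_vs_miltefosine = 10.17 / 1.80
```

Because the SI of 5bc is reported as a lower bound, the resulting fold difference of about 5.65 is also a lower bound, which is where the "approximately 6-fold" figure comes from.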

In Vitro Intracellular Amastigote Susceptibility Assay.
The assay was carried out as previously described. 29 Briefly, 5 × 10^4 J774 macrophages and stationary promastigotes in a 1:5 ratio were seeded in each well of a microtiter plate, suspended in 200 μL of culture medium, and incubated for 24 h at 33°C in a 5% CO2 chamber. After this first incubation, the temperature was increased to 37°C for another 24 h. Thereafter, cells were washed several times in culture medium by centrifugation at 1500g for 5 min in order to remove free non-internalized promastigotes. Finally, the supernatant was replaced by 200 μL/well of culture medium containing 2-fold serial dilutions of the test compounds, as in the promastigote assay. Growth control and signal-to-noise control were also included. Following incubation for 48 h at 37°C and 5% CO2, the culture medium was replaced by 200 μL/well of the lysis solution (RPMI-1640 with 0.048% HEPES and 0.01% SDS) and incubated at room temperature for 20 min. Thereafter, the plates were centrifuged at 3500g for 5 min and the lysis solution was replaced by 200 μL/well of Schneider's insect medium. The culture plates were then incubated at 26°C for another 4 days to allow transformation of viable amastigotes into promastigotes and their proliferation. Afterward, 20 μL/well of 2.5 mM resazurin was added and the plates were incubated for another 3 h. Finally, fluorescence emission was measured and the IC50 was estimated as described above. All tests were carried out in triplicate. Miltefosine (Sigma-Merck, Madrid, Spain) was used as the reference drug and was evaluated under the same conditions. The IC50 and SI were calculated as in the previous section.

Cytotoxicity Assay on Macrophages.
The assay was carried out as previously described. 30 J774 macrophages were seeded (5 × 10^4 cells/well) in 96-well flat-bottom microplates with 100 μL of RPMI 1640 medium. The cells were allowed to attach for 24 h at 37°C and 5% CO2, and the medium was then replaced by different concentrations of the compounds in 200 μL of medium, to which the cells were exposed for another 24 h. Growth controls and signal-to-noise controls were also included. Afterward, 20 μL of 2.5 mM resazurin solution was added, and the plates were returned to the incubator for another 3 h to evaluate cell viability. The reduction of resazurin was determined by fluorometry as in the promastigote assay. Each concentration was assayed three times. The cytotoxic effect of the compounds was defined as the concentration producing a 50% reduction of cell viability in treated cultures with respect to untreated cultures (CC50) and was calculated using a multinomial probit analysis incorporated in SPSS software v21.0.

SOFT.PTML Model.
As mentioned in the introduction, the IFPTML algorithm is useful for finding predictive models for multiobjective optimization of compounds. In fact, we have already used IFPTML models for the study of new pyrroloisoquinolines vs different Leishmania species. 16 However, all steps of IFPTML analysis needed to be performed on different software and/or using different manual operations. Therefore, we decided to use our SOFT.PTML software for the development of IFPTML models for prediction of antileishmanial compounds.
The same dataset 16 containing n = 109,389 preclinical assays was selected and re-processed with the SOFT.PTML software. Figure 4 shows the user-friendly interface of the software with the IF, PT, and ML stages integrated in a single application, which allowed exploring different ML techniques in a more automatic way. Specifically, logistic regression (LOGR), support vector machine (SVM), and random forests (RF) algorithms were studied. 31−33 Table 1 summarizes the results obtained with the different algorithms (see the Supporting Information 1 for details of the dataset used and detailed results of the model for each case).
In this study, a total of 145,851 cases (pre-clinical assay outcomes), distributed in 109,389 cases in the training series and 36,462 cases in the validation series, have been analyzed. The sensitivity (Sn) and specificity (Sp) values obtained from these algorithms were studied using the IFPTML strategy both in the [...] However, the IFPTML-SVM model was discarded because it had a very low value of Sn = 0.6636 in the validation series. The IFPTML-RF model was also discarded because, although the results were very promising (Sp ≈ Sn = 0.8−0.98 range), the model itself is markedly more complex than the linear models. Therefore, the linear model IFPTML-LOGR was selected as the most appropriate based on Occam's razor, or the parsimony principle. 34,35 The equation of the IFPTML-LOGR model is the following (eq 2): [...] PTO_ki(D_ki, c_j) variables in both models is exactly constant = 2.0. This indicates that, except for a scale factor of 2, both equations give equal weight to the different variables and should give similar results. In fact, we found a correlation coefficient of R = 0.98 for the f(v_ij)_calc values obtained with eq 2 vs eq 3. This result demonstrates that the all-in-one strategy implemented in SOFT.PTML is capable of reproducing the results obtained with the multisoftware strategy, using a single program with a user-friendly interface, which makes the work notably easier and faster.
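For reference, sensitivity and specificity as used to compare the classifiers can be computed from observed and predicted Boolean labels as sketched below. The standard definitions Sn = TP/(TP + FN) and Sp = TN/(TN + FP) are assumed here, and the label vectors are hypothetical toy data, not results from Table 1.

```python
def sensitivity_specificity(observed, predicted):
    """Return (Sn, Sp) for Boolean labels, where 1 = active, 0 = inactive."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the observed labels")
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy labels, for illustration only.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
sn, sp = sensitivity_specificity(observed, predicted)
```

Reporting both quantities matters for a dataset such as this one, where negative cases greatly outnumber positive ones and overall accuracy alone could hide a low Sn.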

Computational and Experimental Study of 2-Acylpyrroles.
A case study is presented to illustrate the use of SOFT.PTML models for discovery of antileishmanial compounds in practice. As stated before, we focused on 2-acylpyrroles 5a and 5b, whose synthesis has been previously reported by us, 15 because they combine structural features of related pyrrole derivatives 11−14 with promising antileishmanial activity. To our knowledge, no previous studies on their antileishmanial activity have been reported. We describe herein the in vitro assays and in-depth computational screening of these compounds. First, the synthesis of new samples of 2-acylpyrroles 5a and 5b with pre-clinical assay quality was carried out. Next, these compounds were tested against two species of Leishmania in different development stages. These experimental studies included two biological activity parameters (IC50 and CC50) for two Leishmania species (L. amazonensis and L. donovani). Finally, the study was closed with a wide computational screening of these compounds vs many Leishmania species in different stages and multiple target proteins.

Preparative Organic Synthesis.
Our group has recently reported the synthesis of a variety of 2-(hetero)aroylpyrroles through a Pd(II)-catalyzed acylation of pyrrole with aldehydes 36 in the presence of an oxidant, using 2-methylpyridinyl and 2-pyrimidyl as directing groups. 15 This radical C−H activation reaction 37−41 is a good catalytic alternative to classical acylation methods (Friedel−Crafts, Vilsmeier−Haack, or Houben−Hoesch type acylation reactions), and it minimizes the production of waste as it does not require the use of stoichiometric amounts of Lewis or protic acids. Thus, we had demonstrated that the use of the 2-pyrimidinyl directing group led to C-2 metalation of pyrroles using Pd(OAc)2 as the pre-catalyst in toluene, which were acylated with aldehydes in the presence of TBHP as the oxidant and pivalic acid as the additive. The procedure could be efficiently extended to a series of aldehydes with different substitution patterns on the aromatic ring, obtaining 2-acylpyrroles 5aa−an (Scheme 1), though diacylation could not be completely avoided. However, under the same experimental conditions, the use of the 3-methyl-2-pyridinyl directing group led to the formation of 2-acylpyrrole derivatives 5ba−bs in moderate to good yields, except when electron-withdrawing substituents were present in the aromatic ring (Scheme 1).

Antileishmanial Activity Pre-Clinical Assay.
The 2-(hetero)aroylpyrrole derivatives 5a and 5b were tested against L. amazonensis and L. donovani, which are responsible for the two main clinical forms of this neglected tropical disease, cutaneous and visceral leishmaniasis, respectively (Table 2). We performed in vitro promastigote and in vitro intracellular amastigote susceptibility assays (IC50) and cytotoxicity assays (CC50) on the J774 macrophage cell line using miltefosine as the drug of reference (see Materials and Methods), and the corresponding selectivity indexes (SI) were calculated. Detailed information on the biological activity of the more interesting compounds, including the compound code, concentration, repeated measures of biological activity, and average values, can be found in Supporting Information 2. This file also contains the graphic representations of dose−effect curves for these compounds and the drug of reference, miltefosine.
The performance of each N-pyrimidin-2-yl acylated pyrrole 5a was compared with that of the corresponding N-(3-methylpyridin-2-yl) derivative 5b (Table 2). The bioactivities of some compounds of both series compare well in terms of activity and selectivity against L. amazonensis promastigotes. The aromatic substitution pattern of the acyl group plays an important role in the antileishmanial activity of these pyrrole derivatives. In some cases, we observed similar trends in the bioactivity profile for pyrimidine derivatives 5a and the corresponding pyridines 5b. For example, the 4-tert-butylphenyl pyrrolyl methanones 5ac/5bc and the 3,5-disubstituted phenyl pyrrolyl methanones 5aj/5bj, with electron-donating (MeO) substituents, showed IC50 values in a micromolar range similar to that of miltefosine (Table 2, entries 3 vs 17 and 10 vs 22). This parallel behavior was also maintained for the trisubstituted derivatives 5ak/5bk, which were both inactive under our bioassay conditions (Table 2, entries 11 vs 23). However, there were significant differences among the 2-(hetero)aroylpyrrole derivatives with halogenated aromatic rings. In particular, in the pyridine series, 5bd (R = F) was found to be more active and selective than the drug of reference (miltefosine) (IC50 = 16.87 ± 0.73 μM, SI > 10.67), while the corresponding pyrimidine derivative 5ad (R = F) was inactive (Table 2, entry 17 vs entry 4). It should also be pointed out that compound 5an, where the phenyl ring had been changed to a naphthyl ring, showed similar activity to the drug of reference with better selectivity (Table 2, entry 14).
The same set of 2-(hetero)aroylpyrroles 5a,b was also tested on promastigote forms of L. donovani (Table 2). All compounds were considerably less active and selective than miltefosine. Halogenated pyridine derivatives 5bd−5bf presented the best profiles, 5bd again being the most active and selective of all 2-acylpyrroles (IC50 = 7.78 ± 0.27 μM and SI > 23.15). However, it should be highlighted that all tested pyrrole derivatives were less toxic than miltefosine, with values of the concentration of the compound that produces a 50% reduction of cell viability (cytotoxic concentration, CC50) in the range 87−401 μM in J774 cells. This is a promising result, taking into account the high toxicity (low selectivity) of commercially available drugs. 3 Then, one compound of each series was further tested in vitro on L. amazonensis and L. donovani amastigotes (Table 3). Pyrimidine derivative 5bc showed good performance with an activity similar to miltefosine and better selectivity (IC50 = 60.55 ± 7.88 μM, SI > 5.19) against L. amazonensis. Nevertheless, pyridine derivative 5bc presented poor results in terms of activity and selectivity (IC50 = 153.27 ± 9.11 μM, SI > 1.99).

IFPTML-Based Computational Screening of New Compounds.
For this predictive study, we selected 28 compounds previously synthesized by our group (see structures in Scheme 1), whose in vitro biological activity (IC50 values) vs two Leishmania species (L. donovani 42 and L. amazonensis 43 ) and cytotoxicity vs one cell line (J774 line of BALB/c mice macrophages 44 ) have been determined (Tables 2 and 3). However, there are >20 clinically relevant Leishmania species, such as L. major, 45 L. mexicana, 46 L. aethiopica, 47 L. braziliensis, L. amazonensis, L. donovani, 48 L. infantum, 49 etc. Therefore, it could be very interesting to know (a) other parameters (Ki, Km, etc.) of in vitro biological activity vs specific target proteins and (b) the cytotoxicity of these compounds vs other human and animal cell lines such as Jurkat, 50 Vero, 51 THP-1, 52 HEK293, 53 HeLa, 54 HL-60, 55 Sf9, 56 etc. Consequently, we decided to use our multioutput IFPTML model to perform an in-depth computational screening of the biological activity of these compounds in the whole space of biological assays. Thus, we ran a computational screening experiment involving calculation of 20,704 activity scores for 29 compounds (28 compounds + miltefosine as reference) in 647 different preclinical assays. These 647 preclinical assays of reference present unique combinations of the biological assay conditions c_0 = parameter (Ki, IC50 [...] The following steps were performed. First, DRAGON software 25 was used to calculate the entries of the vector of molecular descriptors for each compound. Next, the values of the molecular descriptors D_ki were substituted into the model, obtaining as output the scores of biological activity f(v_ij)_calc for the ith compound in the jth assay. Finally, the scores of biological activity f(v_ij)_calc were expressed in terms of relative deviation Δf(v_ij)%_calc = 100·[f(v_ij)_calc − f(v_mtfj)_calc]/f(v_mtfj)_calc.
These relative scores express the deviation of the ith query compound from the reference, miltefosine (mtf). In general, the predictions show a higher relative biological activity score $\Delta f(v_{ij})\%_{calc}$ for compound series 5b than for series 5a when compared to the drug of reference (miltefosine). Specifically, compounds 5bc and 5bp show 1−4-fold higher relative biological activity scores than miltefosine. The prediction is consistent with our experimental findings for IC50 activity assays vs L. amazonensis and L. donovani and for the selectivity index in the J774 cell line (Table 4). Table 4 summarizes the scores of biological activity $f(v_{ij})_{calc}$ calculated for some of the 28 compounds (5am, 5af, 5bc, 5bd, and 5bp), considering the most relevant organisms, cell lines, and target proteins. Compound 5bd was predicted with a positive value $\Delta f(v_{ij})\%_{calc}$ = 0.30 for IC50 vs L. amazonensis promastigotes. This means that the model predicts, with high probability, that this compound lies in the same range of activity as miltefosine. This result is in agreement with the experimental findings reported in the previous section, IC50 = 16.87 μM for compound 5bd and IC50 = 30.67 μM for miltefosine vs L. amazonensis promastigotes. Compound 5bc is also predicted with a positive value of $\Delta f(v_{ij})\%_{calc}$ = 2.09 for IC50 vs L. amazonensis promastigotes, which also matches our experimental finding (IC50 = 30.87 μM for compound 5bc vs IC50 = 30.67 μM for miltefosine in the L. amazonensis promastigote assay). The same trend has been observed for other compounds (e.g., 5am) that were also predicted with positive $\Delta f(v_{ij})\%_{calc}$ values approximately in the same range as miltefosine. Interestingly, compound 5af is predicted with scores lower than those of miltefosine (negative values of $\Delta f(v_{ij})\%_{calc}$) vs both L. amazonensis and L. donovani promastigotes, as observed in the experimental results (see Table 2).
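The relative-deviation step can be illustrated with a minimal sketch. Only the formula, 100·[f(v_ij) − f(v_mtfj)]/f(v_mtfj), comes from the text; the function name `relative_deviation`, the compound and assay labels, and all score values below are hypothetical placeholders and are not outputs of the IFPTML model.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]  # compound -> assay -> activity score


def relative_deviation(scores: Scores, reference: str = "miltefosine") -> Scores:
    """Return 100 * (f_ij - f_ref_j) / f_ref_j for every non-reference compound."""
    ref_scores = scores[reference]
    deviations: Scores = {}
    for compound, assay_scores in scores.items():
        if compound == reference:
            continue  # the reference deviates from itself by 0 by definition
        deviations[compound] = {
            assay: 100.0 * (value - ref_scores[assay]) / ref_scores[assay]
            for assay, value in assay_scores.items()
        }
    return deviations


# Hypothetical activity scores, for illustration only.
example_scores = {
    "miltefosine": {"assay_1": 0.50, "assay_2": 0.80},
    "compound_A": {"assay_1": 0.55, "assay_2": 0.60},
}
deviations = relative_deviation(example_scores)
for assay, value in deviations["compound_A"].items():
    print(f"compound_A, {assay}: {value:+.1f}%")
```

Under this definition, a positive value means the query compound scores above the reference in that assay and a negative value means it scores below it.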
In addition, the scores of biological activity of these compounds vs different cell lines were predicted. For this series, the cutoff was 1.62 for the scores of biological activity and 50 for $n_j$. Only one assay per cell line is shown. First, we focused on the selectivity index (SI = CC50/IC50 ratio) of the compounds vs J774 cell lines because these were the lines used experimentally. Specifically, compound 5bc has a value of $\Delta f(v_{ij})\%_{calc}$ = 1.62 for cell line J774.A1, which means that this compound is expected to show a probability similar to or higher than that of miltefosine of presenting a positive SI, in agreement with our experimental results. In fact, compound 5bc was found to have an SI > 10.17, which is approximately 6 times the value for miltefosine (SI = 1.80). In general, the model was able to reproduce the trends in SI values for all the compounds of both series for cell lines J774 and/or J774.A1. Similar results of positive SI were predicted with the IFPTML model for other cell lines not experimentally tested here. The highest values were calculated for the following cell lines: L6, J774.A1, J774, THP-1, LLC-MK2, Vero, and HepG2. This points to these lines as interesting targets for further testing of the safety of these compounds in the future.
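The selectivity comparison quoted above is simple arithmetic, sketched below using only SI values reported in this work (SI > 10.17 for 5bc, SI > 10.67 for 5bd, and SI = 1.80 for miltefosine). The helper names are illustrative. Because the SI values of the compounds are lower bounds, the resulting fold ratios are lower bounds as well.

```python
def selectivity_index(cc50: float, ic50: float) -> float:
    """SI = CC50 / IC50, with both values in the same concentration units."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return cc50 / ic50


def fold_vs_reference(si_compound: float, si_reference: float) -> float:
    """How many times larger the compound SI is than the reference SI."""
    return si_compound / si_reference


SI_MILTEFOSINE = 1.80                           # reference drug, from the text
si_lower_bounds = {"5bc": 10.17, "5bd": 10.67}  # reported as "SI >" values

fold = {
    name: fold_vs_reference(si, SI_MILTEFOSINE)
    for name, si in si_lower_bounds.items()
}
for name, value in fold.items():
    print(f"{name}: SI at least {value:.2f}-fold that of miltefosine")
```

Both ratios are a little under 6, which is the basis for the "approximately 6 times" statement.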
Finally, the proteins with the highest increase in biological activity score relative to the reference miltefosine were selected. In addition, proteins were filtered by the number of assays ($n_j$) reported in the ChEMBL dataset for each protein. The proteins with higher $n_j$ are the most studied and probably the most relevant, given the increased attention they are receiving. To select the cases that include the most relevant proteins, the cutoff for the scores of biological activity was $\Delta f(v_{ij})\%_{calc}$ = 1.62 and the cutoff for $n_j$ was 45. Only one assay per protein is shown. According to the results obtained with the IFPTML model, the most plausible target proteins are the following: bifunctional dihydrofolate reductase-thymidylate synthase (P07382), 57 glucose transporter (O61059), 58 solute carrier family 2 facilitated glucose transporter member 1 (P11166), 59 hexose transporter 1 (Q0GKD7), 60 farnesyl pyrophosphate synthase (O97467), 61 trypanothione reductase (P39050), 62 pyruvate kinase (Q27686), 63 pteridine reductase 1 (Q01782), 64 and methionine-tRNA ligase (E9BF75). 65 After a detailed inspection of all compounds active vs these proteins in our ChEMBL database, no significant similarity was found between the previously reported compounds and the 2-acylpyrrole derivatives tested here. Consequently, 2-acylpyrrole derivatives could be considered a new class of antileishmanial lead compounds that deserve further investigation. The detailed results of the computational study are provided in Supporting Information 3.
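The two-cutoff selection described above could be sketched as follows. This is an illustration under stated assumptions, not the procedure implemented in SOFT.PTML: the `Prediction` record, the function `select_targets`, and the example rows are hypothetical; treating the cutoffs as inclusive and keeping the highest-scoring assay as the "one assay per protein" are both assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Prediction:
    protein: str        # target protein label (hypothetical placeholder)
    assay_id: str       # assay label (hypothetical placeholder)
    delta_f_pct: float  # relative score vs miltefosine
    n_j: int            # number of assays reported for this protein


def select_targets(predictions: List[Prediction],
                   score_cutoff: float = 1.62,
                   n_cutoff: int = 45) -> List[Prediction]:
    """Keep, per protein, the best-scoring assay that passes both cutoffs."""
    best: Dict[str, Prediction] = {}
    for p in predictions:
        # Assumption: cutoffs are inclusive lower limits.
        if p.delta_f_pct < score_cutoff or p.n_j < n_cutoff:
            continue
        current = best.get(p.protein)
        if current is None or p.delta_f_pct > current.delta_f_pct:
            best[p.protein] = p
    # Rank surviving proteins from highest to lowest relative score.
    return sorted(best.values(), key=lambda p: p.delta_f_pct, reverse=True)


# Hypothetical records, for illustration only.
records = [
    Prediction("protein_A", "a1", 2.10, 60),
    Prediction("protein_A", "a2", 3.40, 60),
    Prediction("protein_B", "b1", 1.00, 80),  # fails the score cutoff
    Prediction("protein_C", "c1", 5.00, 12),  # fails the n_j cutoff
    Prediction("protein_D", "d1", 1.62, 45),  # exactly at both cutoffs
]
selected = select_targets(records)
for p in selected:
    print(p.protein, p.assay_id, p.delta_f_pct, p.n_j)
```

Filtering on $n_j$ before ranking keeps sparsely studied proteins from dominating the list on the basis of a single high score.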

CONCLUSIONS
We have shown that SOFT.PTML is a useful tool for developing predictive models for drug discovery. The software implements IFPTML algorithms in a user-friendly interface without the need to rely on multiple software packages to run the different stages (IF, PT, and ML) of the algorithm. More importantly, SOFT.PTML can process complex datasets with big data features (high volume, multiple outputs, multiple target proteins, cell lines, pathogen species, missing data, etc.). Specifically, the use of this software has been illustrated by processing a very large ChEMBL dataset (>145,000 cases) from preclinical assays against different Leishmania species. Among the different ML algorithms (SVM, LOGR, and RF) explored, the best model was IFPTML-LOGR, which estimates the probability with which multiple parameters (IC50, CC50, SI, etc.) of a new compound reach a desired level in pre-clinical assays, with high specificity and sensitivity (80−98%) in both training and validation series. This result demonstrates that the all-in-one strategy implemented in SOFT.PTML is capable of reproducing the results obtained from the multisoftware strategy, using a single program with a user-friendly interface that makes the work noticeably easier and faster. The pre-clinical assays studied involve different Leishmania species and cell lines, as well as multiple target proteins. The use of the new tool has been illustrated with a practical case study on 2-acylpyrrole derivatives. The in vitro evaluation of the leishmanicidal activity of 2-acylpyrrole series 5a and 5b against visceral (L. donovani) and cutaneous (L. amazonensis) leishmaniasis revealed that all tested 2-acylpyrroles showed very low cytotoxicity, CC50 > 100 μg/mL in J774 cells (highest tested dose). This is an important feature, as drug toxicity is one of the main limitations of current chemotherapy for leishmaniasis.
In particular, 5bd (IC50 = 16.87 μM, SI > 10.67) was more potent and approximately 6-fold more selective than the drug of reference (miltefosine) in L. amazonensis promastigote assays. These results point to 2-acylpyrroles as a new class of lead compounds worthy of further optimization as antileishmanial hits.

ASSOCIATED CONTENT
Supporting Information
curves for these compounds and the drug of reference,